Fast and Space-optimal Low-rank Factorization in the Streaming Model With Application in Differential Privacy

Author

  • Jalaj Upadhyay
Abstract

In this paper, we consider the problem of computing a low-rank factorization of an m × n matrix in the general turnstile update model. We consider both the private and non-private settings.

1. In the non-private setting, we give a space-optimal algorithm that computes a low-rank factorization. Our algorithm maintains three sketches of the matrix instead of five as in Boutsidis et al. (STOC 2016). Our algorithm takes Õ(1) time to update the sketch and computes the factorization in time linear in the sparsity and the dimensions of the matrix.

2. In the private setting, we study low-rank factorization in the framework of differential privacy and under turnstile updates. We give two algorithms with respect to two levels of privacy. Both of our privacy levels are stronger than previously studied privacy levels, namely those of Blocki et al. (FOCS 2012), Dwork et al. (STOC 2014), Hardt and Roth (STOC 2012, STOC 2013), and Hardt and Price (NIPS 2014).

   (a) In our first level of privacy, Priv1, we consider two matrices as neighboring if their difference has the form uv^T for some unit vectors u and v. Our private algorithm with respect to Priv1 matches the optimal space bound up to a logarithmic factor and is optimal in terms of the additive error incurred. The algorithm is also efficient and takes time linear in the input sparsity of the matrix and quadratic in min{m, n}. Our bound quantitatively improves the result of Hardt and Roth (STOC 2012) by a factor of √ k log(1/δ) when m ≤ n, a scenario considered by Hardt and Roth (STOC 2012).

   (b) Our second level, Priv2, generalizes Priv1. In Priv2, we consider two matrices as neighboring if their difference has unit Frobenius norm. Our private algorithm with respect to Priv2 is computationally more efficient than our first algorithm: it uses O(log(m+n)) time to update and computes the factorization in time linear in the input sparsity and the dimensions of the matrix. This algorithm incurs optimal additive error and uses optimal space when n m.

Research supported by NSF award IIS-1447700.

arXiv:1604.01429v3 [cs.DS] 20 May 2016
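The relation between the two neighboring notions can be checked numerically. The sketch below is only an illustration, not code from the paper: it assumes the Priv1 difference is the rank-one outer product of unit vectors u and v, and the helper names (`unit_vector`, `outer`, `frobenius_norm`, `is_priv2_neighbor`) are hypothetical. It shows that such a difference always has unit Frobenius norm, so every Priv1 neighbor is also a Priv2 neighbor, and it gives one unit-norm difference of rank two that fits Priv2 but cannot be written as an outer product.

```python
import math
import random


def unit_vector(dim, rng):
    """Draw a random Gaussian vector and scale it to Euclidean norm 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def outer(u, v):
    """Rank-one outer product of u and v as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def frobenius_norm(mat):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in mat for x in row))


def is_priv2_neighbor(a, b, tol=1e-9):
    """Priv2 as stated in the abstract: the difference has unit Frobenius norm."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return abs(frobenius_norm(diff) - 1.0) <= tol


rng = random.Random(0)  # fixed seed so the run is repeatable
m, n = 4, 6
a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(m)]

# Priv1-style neighbor (assumed form): add an outer product of unit vectors.
u, v = unit_vector(m, rng), unit_vector(n, rng)
e = outer(u, v)
b = [[x + y for x, y in zip(ra, re)] for ra, re in zip(a, e)]

# A unit-norm difference of rank two: valid for Priv2, not of outer-product form.
d = [[0.0] * n for _ in range(m)]
d[0][0] = d[1][1] = 1.0 / math.sqrt(2.0)
c = [[x + y for x, y in zip(ra, rd)] for ra, rd in zip(a, d)]

# Nonzero 2x2 minor, so d has rank at least 2.
minor = d[0][0] * d[1][1] - d[0][1] * d[1][0]

print(is_priv2_neighbor(a, b), is_priv2_neighbor(a, c), minor != 0.0)
```

The first check holds because the Frobenius norm of an outer product equals the product of the Euclidean norms of its two factors, which is 1 for unit vectors.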

Similar resources

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Differentially Private Linear Algebra in the Streaming Model

The focus of this paper is a systematic study of differential privacy on streaming data using sketch-based algorithms. Previous works, like Dwork et al. (ICS 2010, STOC 2010), explored random sampling based streaming algorithms. We work in the well-studied streaming model of computation, where the database is stored in the form of a matrix and a curator can access the database row-wise or column...

On Low-Space Differentially Private Low-rank Factorization in the Spectral Norm

Low-rank factorization is used in many areas of computer science where one performs spectral analysis on large sensitive data stored in the form of matrices. In this paper, we study differentially private low-rank factorization of a matrix with respect to the spectral norm in the turnstile update model. In this problem, given an input matrix A ∈ R^{m×n} updated in the turnstile manner and a target...

Customer Order Scheduling with Job-Based Processing and Lot Streaming In A Two-Machine Flow Shop

This paper considers a customer order scheduling (COS) problem in which each customer requests a variety of products processed in a two-machine flow shop. A sequence-independent attached setup for each machine is needed before processing each product lot. We assume that customer orders are satisfied by the job-based processing approach in which the same products from different customer orders f...

Evaluating the Quality of Optimal Privacy in the Study Spaces of Libraries and its Impact On the Satisfaction Rates of Consulting Individuals (Case Study: Public Library of Qazvin)

Privacy is one of the essential needs of human beings. The balance between privacy and social interaction among individuals is influenced by architectural elements enriched by the cultural values of each society, which in turn leads to a sense of satisfaction with the environment. The scope of environmental psychology is the relationship between humans and their environments; ...


Journal:
  • CoRR

Volume abs/1604.01429

Publication date 2016